DAB Join: A Distributed Adaptive and Balanced N-way Stream Window Join for Shared Nothing Clusters
In this work, we present DAB Join, an adaptive operator that enables scalable processing of multiway windowed stream joins on a shared-nothing cluster. DAB Join (1) supports any kind of join predicate, (2) minimizes the network cost while distributing the load equally across all cluster nodes, and (3) uses only one hop to distribute the data and execute the join, avoiding the distribution of intermediate results, which may be very large. We have implemented DAB Join on top of an experimental system named ExaStream, which supports the distributed execution of jobs expressed as DAGs of user-defined operators. Our experimental results on synthetic streams show that the algorithm scales. Additionally, DAB Join adapts to changes in stream input rates, which results in better execution times compared to non-adaptive alternatives.
Towards Analytics Aware Ontology Based Access to Static and Streaming Data (Extended Version)
Real-time analytics that requires integration and aggregation of
heterogeneous and distributed streaming and static data is a typical task in
many industrial scenarios such as diagnostics of turbines at Siemens. The OBDA
approach has great potential to facilitate such tasks; however, it has a
number of limitations in dealing with analytics that restrict its use in
important industrial applications. Based on our experience with Siemens, we
argue that in order to overcome those limitations OBDA should be extended and
become analytics, source, and cost aware. In this work we propose such an
extension. In particular, we propose an ontology, mapping, and query language
for OBDA, where aggregate and other analytical functions are first class
citizens. Moreover, we develop query optimisation techniques that allow
analytical tasks over static and streaming data to be processed efficiently. We
implement our approach in a system and evaluate it on Siemens turbine
data.
Ontology-Based Integration of Streaming and Static Relational Data with Optique
Real-time processing of data coming from multiple heterogeneous data
streams and static databases is a typical task in many industrial
scenarios such as diagnostics of large machines. A complex diagnostic
task may require a collection of up to hundreds of queries over such
data. Although many of these queries retrieve data of the same kind,
such as temperature measurements, they access structurally different
data sources. In this work we show how Semantic Technologies implemented
in our system OPTIQUE can simplify such complex diagnostics by providing
an abstraction layer ontology that integrates heterogeneous data. In a
nutshell, OPTIQUE allows complex diagnostic tasks to be expressed with
just a few high-level semantic queries. The system can then
automatically enrich these queries, translate them into a collection
with a large number of low-level data queries, and finally optimise and
efficiently execute the collection in a heavily distributed environment.
We will demo the benefits of OPTIQUE on a real-world scenario from
Siemens.